Abstract. We study the connection between a system of many independent Brownian particles 
on one hand and the deterministic diffusion equation on the other. For a fixed time step $h > 0$, 
a large-deviations rate functional $J_h$ characterizes the behaviour of the particle system at $t = h$ 
in terms of the initial distribution at $t = 0$. For the diffusion equation, a single step in the 
time-discretized entropy-Wasserstein gradient flow is characterized by the minimization of a 
functional $K_h$. We establish a new connection between these systems by proving that $J_h$ and 
$K_h$ are equal up to second order in $h$ as $h \to 0$. 

This result gives a microscopic explanation of the origin of the entropy-Wasserstein gradient 
flow formulation of the diffusion equation. Simultaneously, the limit passage presented here 
gives a physically natural description of the underlying particle system by describing it as an 
entropic gradient flow. 

Key words and phrases: Stochastic particle systems, generalized gradient flows, variational 
evolution equations, hydrodynamic limits, optimal transport, Gamma-convergence. 



1. Introduction 

1.1. Particle-to-continuum limits. In 1905, Einstein showed [Ein05] how the bombardment of 
a particle by surrounding fluid molecules leads to behaviour that is described by the macroscopic 
diffusion equation (in one dimension) 

\[ \partial_t \rho = \partial_{xx} \rho \qquad \text{for } (x,t) \in \mathbb{R} \times \mathbb{R}^+. \tag{1} \]

There are now many well-established derivations of continuum equations from stochastic particle 
models, both formal and rigorous [DMP92, KL99]. 

In this paper we investigate a new method to connect some stochastic particle systems with their 
upscaled deterministic evolution equations, in situations where these equations can be formulated 
as gradient flows. This method is based on a connection between two concepts: large-deviations 
rate functionals associated with stochastic processes on one hand, and gradient-flow formulations 
of deterministic differential equations on the other. We explain these below. 

The paper is organized around a simple example: the empirical measure of a family of $n$ 
Brownian particles $X^{(i)}(t) \in \mathbb{R}$, $t \ge 0$, has a limit as $n \to \infty$, which is characterized by 
equation (1). The natural variables to compare are the empirical measure of the position at time $t$, 
i.e. $L_n^t = n^{-1} \sum_{i=1}^n \delta_{X^{(i)}(t)}$, which describes the density of particles, and the solution 
$\rho(\cdot,t)$ of (1). We take a time-discrete point of view and consider time points $t = 0$ and $t = h > 0$. 

Large-deviations principles. A large-deviations principle characterizes the fluctuation behaviour 
of a stochastic process. We consider the behaviour of $L_n^h$ under the condition of a given initial 
distribution $L_n^0 \approx \rho_0 \in \mathcal{M}_1(\mathbb{R})$, where $\mathcal{M}_1(\mathbb{R})$ is the space of probability measures on $\mathbb{R}$. A 
large-deviations result expresses the probability of finding $L_n^h$ close to some $\rho \in \mathcal{M}_1(\mathbb{R})$ as 

\[ \mathbb{P}\big(L_n^h \approx \rho \,\big|\, L_n^0 \approx \rho_0\big) \approx \exp\big[-n J_h(\rho\,;\rho_0)\big] \qquad \text{as } n \to \infty. \tag{2} \]

The rate functional $J_h$ thus characterizes the probability of 
observing a given realization $\rho$: large values of $J_h$ imply small probability. Rigorous statements 
are given below. 

Gradient flow-formulations of parabolic PDEs. An equation such as (1) characterizes an 
evolution in a state space $\mathcal{X}$, which in this case we can take as $\mathcal{X} = \mathcal{M}_1(\mathbb{R})$ or $\mathcal{X} = L^1(\mathbb{R})$. A 
gradient-flow formulation of the equation is an equivalent formulation with a specific structure. 
It employs two quantities, a functional $E : \mathcal{X} \to \mathbb{R}$ and a dissipation metric $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. 
Equation (1) can be written as the gradient flow of the entropy functional $E(\rho) = \int \rho \log \rho \, dx$ 
with respect to the Wasserstein metric $d$ (again, see below for precise statements). We shall use 
the following property: the solution $t \mapsto \rho(t,\cdot)$ of (1) can be approximated by the time-discrete 
sequence $\{\rho^n\}$ defined recursively by 

\[ \rho^n \in \operatorname*{argmin}_{\rho \in \mathcal{X}} K_h(\rho\,;\rho^{n-1}), \qquad K_h(\rho\,;\rho^{n-1}) := \frac{1}{2h}\, d(\rho,\rho^{n-1})^2 + E(\rho) - E(\rho^{n-1}). \tag{3} \]

Connecting large deviations with gradient flows. The results of this paper are illustrated 
in the diagram below. 

    discrete-time                this paper              discrete-time variational
    rate functional J_h   ----------------------->      formulation K_h
                             Gamma-convergence
                                  h -> 0
            ^                                                   |
            |  large-deviations principle                       |  h -> 0          (4)
            |  n -> oo                                          v

    Brownian particle         continuum limit
    system                ----------------------->      continuum equation (1)
                                  n -> oo



The lower level of this diagram is the classical connection: in the limit $n \to \infty$, the empirical 
measure $t \mapsto L_n^t$ converges to the solution $\rho$ of equation (1). In the left-hand column the 
large-deviations principle mentioned above connects the particle system with the rate functional $J_h$. 
The right-hand column is the formulation of equation (1) as a gradient flow, in the sense that 
the time-discrete approximations constructed by successive minimization of $K_h$ converge to (1) as 
$h \to 0$. 

Both functionals $J_h$ and $K_h$ describe a single time step of length $h$: $J_h$ characterizes the 
fluctuations of the particle system after time $h$, and $K_h$ characterizes a single time step of length $h$ 
in the time-discrete approximation of (1). In this paper we make a new connection, a 
Gamma-convergence result relating $J_h$ to $K_h$, indicated by the top arrow. It is this last connection that is 
the main mathematical result of this paper. 

This result is interesting for a number of reasons. First, it places the entropy-Wasserstein 
gradient-flow formulation of (1) in the context of large deviations for a system of Brownian 
particles. In this sense it gives a microscopic justification of the coupling between the entropy functional 
and the Wasserstein metric, as it occurs in (3). Secondly, it shows that $K_h$ not only characterizes 
the deterministic evolution via its minimizer, but also the fluctuation behaviour via the connection 
to $J_h$. Finally, it suggests a principle that may be much more widely valid, in which gradient-flow 
formulations have an intimate connection with large-deviations rate functionals associated with 
stochastic particle systems. 

The structure of this paper is as follows. We first introduce the specific system of this paper and 
formulate the existing large-deviations result (2). In Section 3 we discuss the abstract 
gradient-flow structure and recall the definition of the Wasserstein metric. Section 4 gives the central result, 
and Section 5 provides a discussion of the background and relevance. Finally the two parts of the 
proof of the main result, the upper and lower bounds, are given in Sections 7 and 8. 

Throughout this paper, measure-theoretical notions such as absolute continuity are with respect 
to the Lebesgue measure, unless indicated otherwise. By abuse of notation, we will often identify 
a measure with its Lebesgue density. 



2. Microscopic model and Large-Deviations Principle 

Equation (1) arises as the hydrodynamic limit of a wide variety of particle systems. In this 
paper we consider the simplest of these, which is a collection of $n$ independently moving Brownian 
particles. A Brownian particle is a particle whose position in $\mathbb{R}$ is given by a Wiener process, for 
which the probability of a particle moving from $x \in \mathbb{R}$ to $y \in \mathbb{R}$ in time $h > 0$ is given by the 
probability density 

\[ p_h(x,y) := \frac{1}{(4\pi h)^{1/2}}\, e^{-(y-x)^2/4h}. \tag{5} \]

Alternatively, this corresponds to the Brownian bridge measure for the $n$ random elements in the 
space of all continuous functions $[0,h] \to \mathbb{R}$. We work with Brownian motions having generator 
$\Delta$ instead of $\tfrac12\Delta$, and we write $\mathbb{P}_x$ for the probability measure under which $X = X^{(1)}$ starts from 
$x \in \mathbb{R}$. 

We now specify our system of Brownian particles. Fix a measure $\rho_0 \in \mathcal{M}_1(\mathbb{R})$ which will serve 
as the initial distribution of the $n$ Brownian motions $X^{(1)}, \ldots, X^{(n)}$ in $\mathbb{R}$. For each $n \in \mathbb{N}$, we let 
$(X^{(i)})_{i=1,\ldots,n}$ be a collection of independent Brownian motions, whose distribution is given by the 
product $\mathbb{P}_n = \bigotimes_{i=1}^n \mathbb{P}_{\rho_0}$, where $\mathbb{P}_{\rho_0} = \rho_0(dx)\,\mathbb{P}_x$ is the probability measure under which $X = X^{(1)}$ 
starts with initial distribution $\rho_0$. 

It follows from the definition of the Wiener process and the law of large numbers that the 
empirical measure $L_n^t$, the random probability measure in $\mathcal{M}_1(\mathbb{R})$ defined by 

\[ L_n^t := \frac{1}{n} \sum_{i=1}^n \delta_{X^{(i)}(t)}, \]

converges in probability to the solution $\rho$ of (1) with initial datum $\rho_0$. In this sense the equation (1) 
is the many-particle limit of the Brownian-particle system. Here and in the rest of this paper the 
convergence $\rightharpoonup$ is the weak-* or weak convergence for probability measures, defined by the duality 
with the set of continuous and bounded functions $C_b(\mathbb{R})$. 
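
The many-particle limit can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it assumes a centred Gaussian initial datum (for which the solution of (1) is again Gaussian, with variance increased by $2h$), and the function names are illustrative choices. Since the generator is $\Delta$ rather than $\tfrac12\Delta$, each particle increment over time $h$ has variance $2h$.

```python
import numpy as np

def simulate_positions(n, h, rho0_sampler, rng):
    """Positions at time h of n independent Brownian particles with generator Laplacian.

    With generator Delta (not Delta/2) the increment over time h is N(0, 2h).
    """
    x0 = rho0_sampler(n, rng)
    return x0 + np.sqrt(2.0 * h) * rng.standard_normal(n)

def heat_solution_gaussian(x, h, var0):
    """Solution of d_t rho = d_xx rho at time h for a centred Gaussian datum of variance var0."""
    var = var0 + 2.0 * h
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Assumed test setting: rho_0 = N(0, var0), one time step of length h.
rng = np.random.default_rng(0)
h, var0, n = 0.5, 1.0, 200_000
xh = simulate_positions(n, h, lambda m, r: np.sqrt(var0) * r.standard_normal(m), rng)

# Compare a histogram of the empirical measure with the deterministic density.
hist, edges = np.histogram(xh, bins=60, range=(-6.0, 6.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(hist - heat_solution_gaussian(centres, h, var0)))
print(f"sample variance {xh.var():.4f}, max histogram error {max_err:.4f}")
```

The histogram only probes weak convergence of $L_n^h$; it says nothing about the exponentially small probabilities that the large-deviations principle below quantifies.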

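Large-deviations principles are given for many empirical measures of the $n$ Brownian motions 
under the product measure $\mathbb{P}_n$. Of particular interest to us is the empirical measure for the pair of 
the initial and terminal position for a given time horizon $[0,h]$, that is, the empirical pair measure 

\[ Y_n := \frac{1}{n} \sum_{i=1}^n \delta_{(X^{(i)}(0),\, X^{(i)}(h))}. \]

Note that the empirical measures $L_n^0$ and $L_n^h$ are the first and second marginals of $Y_n$. 
The relative entropy $H : \mathcal{M}_1(\mathbb{R} \times \mathbb{R})^2 \to [0,\infty]$ is the functional 

\[ H(q\,|\,p) := \begin{cases} \displaystyle\int_{\mathbb{R}\times\mathbb{R}} f(x,y) \log f(x,y)\, p(d(x,y)) & \text{if } q \ll p,\ f = \dfrac{dq}{dp}, \\[2mm] +\infty & \text{otherwise.} \end{cases} \]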
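
For intuition, the same definition can be written out for measures on a finite set. The sketch below is illustrative only: the function name and the discrete setting are assumptions, not taken from the paper. It returns $+\infty$ when $q$ is not absolutely continuous with respect to $p$ and uses the convention $0 \log 0 = 0$.

```python
import numpy as np

def relative_entropy(q, p):
    """Relative entropy H(q | p) of two probability vectors on the same finite set."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    # q is not absolutely continuous w.r.t. p if q charges a point that p does not.
    if np.any((p == 0.0) & (q > 0.0)):
        return float("inf")
    mask = q > 0.0  # convention 0 * log 0 = 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

print(relative_entropy([1.0, 0.0], [0.5, 0.5]))  # equals log 2
```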

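For given $\rho_0, \rho \in \mathcal{M}_1(\mathbb{R})$ denote by 

\[ \Gamma(\rho_0,\rho) := \{ q \in \mathcal{M}_1(\mathbb{R} \times \mathbb{R}) : \pi_0 q = \rho_0,\ \pi_1 q = \rho \} \tag{6} \]

the set of pair measures whose first marginal $\pi_0 q(d\cdot) := \int_{\mathbb{R}} q(d\cdot, dy)$ equals $\rho_0$ and whose second 
marginal $\pi_1 q(d\cdot) := \int_{\mathbb{R}} q(dx, d\cdot)$ equals $\rho$. For a given $\delta > 0$ we denote by $B_\delta = B_\delta(\rho_0)$ the open 
ball with radius $\delta > 0$ around $\rho_0$ with respect to the Lévy metric on $\mathcal{M}_1(\mathbb{R})$ [DS89, Sec. 3.2]. 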

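Theorem 1 (Conditional large deviations). Fix $\delta > 0$ and $\rho_0 \in \mathcal{M}_1(\mathbb{R})$. The sequence 
$(\mathbb{P}_n \circ (L_n^h)^{-1})_{n \in \mathbb{N}}$ satisfies under the condition that $L_n^0 \in B_\delta(\rho_0)$ a large deviations principle on 
$\mathcal{M}_1(\mathbb{R})$ with rate $n$ and rate function 

\[ J_{h,\delta}(\rho\,;\rho_0) := \inf_{q\,:\ \pi_0 q \in B_\delta(\rho_0),\ \pi_1 q = \rho} H(q\,|\,q_0), \qquad \rho \in \mathcal{M}_1(\mathbb{R}), \tag{7} \]

where 

\[ q_0(dx,dy) := \rho_0(dx)\, p_h(x,y)\, dy. \tag{8} \]

This means that 

(1) For each open $O \subset \mathcal{M}_1(\mathbb{R})$, 

\[ \liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}_n\big(L_n^h \in O \,\big|\, L_n^0 \in B_\delta(\rho_0)\big) \ge - \inf_{\rho \in O} J_{h,\delta}(\rho\,;\rho_0). \]

(2) For each closed $K \subset \mathcal{M}_1(\mathbb{R})$, 

\[ \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}_n\big(L_n^h \in K \,\big|\, L_n^0 \in B_\delta(\rho_0)\big) \le - \inf_{\rho \in K} J_{h,\delta}(\rho\,;\rho_0). \]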

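A proof of this standard result can be given by an argument along the following lines. First, 
note that 

\[ \mathbb{P}_{\rho_0} \circ (\sigma_0,\sigma_h)^{-1}(dx,dy) = \rho_0(dx)\, \mathbb{P}_x(X(h) \in dy) = \rho_0(dx)\, p_h(x,y)\, dy =: q_0(dx,dy), \qquad x,y \in \mathbb{R}, \]

where $\sigma_s : C([0,h];\mathbb{R}) \to \mathbb{R}$, $\omega \mapsto \omega(s)$ is the projection of any path $\omega$ to its position at time $s \ge 0$. 
By Sanov's Theorem, the sequence $(\mathbb{P}_n \circ Y_n^{-1})_{n \in \mathbb{N}}$ of the empirical pair measures satisfies a 
large-deviations principle on $\mathcal{M}_1(\mathbb{R} \times \mathbb{R})$ with speed $n$ and rate function $q \mapsto H(q\,|\,q_0)$, $q \in \mathcal{M}_1(\mathbb{R} \times \mathbb{R})$ (see 
e.g. [dH00, Csi84]). Secondly, the contraction principle (e.g., [dH00, Sec. III.5]) shows that the pair 
of marginals $(L_n^0, L_n^h) = (\pi_0 Y_n, \pi_1 Y_n)$ of $Y_n$ satisfies a large deviations principle on $\mathcal{M}_1(\mathbb{R}) \times \mathcal{M}_1(\mathbb{R})$ 
with rate $n$ and rate function 

\[ (\tilde\rho_0, \rho) \mapsto \inf_{q \in \mathcal{M}_1(\mathbb{R}\times\mathbb{R})\,:\ \pi_0 q = \tilde\rho_0,\ \pi_1 q = \rho} H(q\,|\,q_0), \]

for any $\tilde\rho_0, \rho \in \mathcal{M}_1(\mathbb{R})$. Thirdly, as in the first step, it follows that the empirical measure $L_n^0$ under 
$\mathbb{P}_n$ satisfies a large deviations principle on $\mathcal{M}_1(\mathbb{R})$ with rate $n$ and rate function $\tilde\rho_0 \mapsto H(\tilde\rho_0\,|\,\rho_0)$, 
for $\tilde\rho_0 \in \mathcal{M}_1(\mathbb{R})$. 

Therefore for a subset $A \subset \mathcal{M}_1(\mathbb{R})$, 

\[ \frac{1}{n} \log \mathbb{P}_n\big(L_n^h \in A \,\big|\, L_n^0 \in B_\delta\big) = \frac{1}{n} \log \mathbb{P}_n\big(L_n^h \in A,\ L_n^0 \in B_\delta\big) - \frac{1}{n} \log \mathbb{P}_n\big(L_n^0 \in B_\delta\big) \]
\[ \sim\ - \inf_{q\,:\ \pi_0 q \in B_\delta,\ \pi_1 q \in A} H(q\,|\,q_0) + \inf_{\tilde\rho_0 \in B_\delta} H(\tilde\rho_0\,|\,\rho_0). \]

Since $\rho_0 \in B_\delta$, the latter infimum equals zero, and the claim of Theorem 1 follows. 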

We now consider the limit of the rate functional as the radius $\delta \to 0$. Two notions of convergence 
are appropriate, that of pointwise convergence and Gamma convergence. 

Lemma 2. Fix $\rho_0 \in \mathcal{M}_1(\mathbb{R})$. As $\delta \downarrow 0$, $J_{h,\delta}(\,\cdot\,;\rho_0)$ converges in $\mathcal{M}_1(\mathbb{R})$ both in the pointwise and 
in the Gamma sense to 

\[ J_h(\rho\,;\rho_0) := \inf_{q\,:\ \pi_0 q = \rho_0,\ \pi_1 q = \rho} H(q\,|\,q_0). \]

Gamma convergence means here that 

(1) (Lower bound) For each sequence $\rho^\delta \rightharpoonup \rho$ in $\mathcal{M}_1(\mathbb{R})$, 

\[ \liminf_{\delta \to 0} J_{h,\delta}(\rho^\delta\,;\rho_0) \ge J_h(\rho\,;\rho_0), \tag{9} \]

(2) (Recovery sequence) For each $\rho \in \mathcal{M}_1(\mathbb{R})$, there exists a sequence $(\rho^\delta) \subset \mathcal{M}_1(\mathbb{R})$ with 
$\rho^\delta \rightharpoonup \rho$ such that 

\[ \lim_{\delta \to 0} J_{h,\delta}(\rho^\delta\,;\rho_0) = J_h(\rho\,;\rho_0). \tag{10} \]

Proof. $J_{h,\delta}(\,\cdot\,;\rho_0)$ is an increasing sequence of convex functionals on $\mathcal{M}_1(\mathbb{R})$; therefore it converges 
at each fixed $\rho \in \mathcal{M}_1(\mathbb{R})$. The Gamma-convergence then follows from, e.g., [DM93, Prop. 5.4] 
or [Bra02, Rem. 1.40]. □ 

Remark. Léonard [Leo07] proves a similar statement, where he replaces the ball $B_\delta(\rho_0)$ in 
Theorem 1 by an explicit sequence $\rho_{0,n} \rightharpoonup \rho_0$. The rate functional that he obtains is again $J_h$. 

Summarizing, the combination of Theorem 1 and Lemma 2 forms a rigorous version of the 
statement (2). The parameter $\delta$ in Theorem 1 should be thought of as an artificial parameter, 
introduced to make the large-deviations statement non-singular, and which is eliminated by the 
Gamma-limit of Lemma 2. 

3. Gradient flows 

Let us briefly recall the concept of a gradient flow, starting with flows in $\mathbb{R}^d$. The gradient flow 
in $\mathbb{R}^d$ of a functional $E : \mathbb{R}^d \to \mathbb{R}$ is the evolution in $\mathbb{R}^d$ given by 

\[ \dot x^i(t) = -\partial_i E(x(t)) \tag{11} \]

which can be written in a geometrically more correct way as 

\[ \dot x^i(t) = -g^{ij} \partial_j E(x(t)). \tag{12} \]

The metric tensor $g$ converts the covector field $\nabla E$ into a vector field that can be assigned to $\dot x$. In 
the case of (11) we have $g^{ij} = \delta^{ij}$, the Euclidean metric, and for a general Riemannian manifold 
with metric tensor $g$, equation (12) defines the gradient flow of $E$ with respect to $g$. 

In recent years this concept has been generalized to general metric spaces [AGS05]. This 
generalization is partly driven by the fact, first observed by Jordan, Kinderlehrer, and Otto [JKO97, 
JKO98], that many parabolic evolution equations of a diffusive type can be written as gradient 
flows in a space of measures with respect to the Wasserstein metric. The Wasserstein distance is 
defined on the set of probability measures with finite second moments, 

\[ \mathcal{P}_2(\mathbb{R}) := \Big\{ \rho \in \mathcal{M}_1(\mathbb{R}) : \int_{\mathbb{R}} x^2 \rho(dx) < \infty \Big\}, \]

and is given by 

\[ d(\rho_0,\rho_1)^2 := \inf_{\gamma \in \Gamma(\rho_0,\rho_1)} \int_{\mathbb{R} \times \mathbb{R}} (x-y)^2\, \gamma(d(x,y)), \tag{13} \]

where $\Gamma(\rho_0,\rho_1)$ is defined in (6). 
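
In one space dimension the infimum in the definition of the Wasserstein distance is attained by the monotone coupling, so for two samples of equal size the squared distance between their empirical measures is the mean squared difference of the sorted samples. The following minimal sketch uses that fact; the function name and the restriction to equal sample sizes are assumptions made for illustration.

```python
import numpy as np

def wasserstein2_squared_1d(x, y):
    """Squared Wasserstein-2 distance between the empirical measures of two
    equal-size samples on the real line, via the monotone (sorted) coupling."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean((x - y) ** 2))

# Translating a measure by a gives squared distance a^2.
print(wasserstein2_squared_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0]))
```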

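Examples of parabolic equations that can be written as a gradient flow of some energy $E$ with 
respect to the Wasserstein distance are 

• The diffusion equation (1); this is the gradient flow of the (negative) entropy 

\[ E(\rho) := \int_{\mathbb{R}} \rho \log \rho \, dx. \tag{14} \]

• nonlocal convection-diffusion equations [JKO98, AGS05, CMV06] of the form 

\[ \partial_t \rho = \operatorname{div} \rho \nabla \big[ U'(\rho) + V + W * \rho \big], \tag{15} \]

where $U$, $V$, and $W$ are given functions on $\mathbb{R}$, $\mathbb{R}^d$, and $\mathbb{R}^d$, respectively; 

• higher-order parabolic equations [Ott98, GO01, Gla03, MMS09, GST08] of the form 

\[ \partial_t \rho = -\operatorname{div} \rho \nabla \big( \rho^{\alpha-1} \Delta \rho^\alpha \big), \tag{16} \]

for $1/2 < \alpha < 1$; 

• moving-boundary problems, such as a prescribed-angle lubrication-approximation model [Ott98] 

\[ \partial_t \rho = -\partial_x(\rho\, \partial_{xxx} \rho) \ \text{ in } \{\rho > 0\}, \qquad \partial_x \rho = \pm 1 \ \text{ on } \partial\{\rho > 0\}, \tag{17} \]

and a model of crystal dissolution and precipitation [PP08] 

\[ \partial_t \rho = \partial_{xx} \rho \ \text{ in } \{\rho > 0\}, \quad \text{with } \partial_n \rho = -\rho v_n \text{ and } v_n = f(\rho) \text{ on } \partial\{\rho > 0\}. \tag{18} \]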

4. The central statement 

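The aim of this paper is to connect $J_h$ to the functional $K_h$ in the limit $h \to 0$, in the sense 
that 

\[ J_h(\,\cdot\,;\rho_0) \sim \tfrac12 K_h(\,\cdot\,;\rho_0) \qquad \text{as } h \to 0. \tag{19} \]

For any $\rho \ne \rho_0$ both $J_h(\rho\,;\rho_0)$ and $K_h(\rho\,;\rho_0)$ diverge as $h \to 0$, however, and we therefore 
reformulate this statement in the form 

\[ J_h(\,\cdot\,;\rho_0) - \frac{1}{4h}\, d(\,\cdot\,,\rho_0)^2 \longrightarrow \tfrac12 E(\,\cdot\,) - \tfrac12 E(\rho_0). \]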






STEFAN ADAMS, NICOLAS DIRR, MARK A. PELETIER, AND JOHANNES ZIMMER 



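The precise statement is given in the theorem below. This theorem is probably true in greater 
generality, possibly even for all $\rho_0, \rho \in \mathcal{P}_2(\mathbb{R}^d)$. For technical reasons we need to impose restrictive 
conditions on $\rho_0$ and $\rho$, and to work in one space dimension, on a bounded domain $[0,L]$. 

For any $0 < \delta < 1$ we define the set 

\[ A_\delta := \Big\{ \rho \in L^\infty(0,L) : \int_0^L \rho = 1 \ \text{ and } \ \big\| \rho - L^{-1} \big\|_\infty < \delta \Big\}. \]

Theorem 3. Let $J_h$ be defined as in Lemma 2. Fix $L > 0$; there exists $\delta > 0$ with the following property. 
Let $\rho_0 \in A_\delta \cap C([0,L])$. Then 

\[ J_h(\,\cdot\,;\rho_0) - \frac{1}{4h}\, d(\,\cdot\,,\rho_0)^2 \longrightarrow \tfrac12 E(\,\cdot\,) - \tfrac12 E(\rho_0) \qquad \text{as } h \to 0, \tag{20} \]

in the set $A_\delta$, where the arrow denotes Gamma-convergence with respect to the narrow topology. 
In this context this means that the two following conditions hold: 

(1) (Lower bound) For each sequence $\rho^h \rightharpoonup \rho$ in $A_\delta$, 

\[ \liminf_{h \to 0}\ J_h(\rho^h\,;\rho_0) - \frac{1}{4h}\, d(\rho^h,\rho_0)^2 \ \ge\ \tfrac12 E(\rho) - \tfrac12 E(\rho_0). \tag{21} \]

(2) (Recovery sequence) For each $\rho \in A_\delta$, there exists a sequence $(\rho^h) \subset A_\delta$ with $\rho^h \rightharpoonup \rho$ such 
that 

\[ \lim_{h \to 0}\ J_h(\rho^h\,;\rho_0) - \frac{1}{4h}\, d(\rho^h,\rho_0)^2 \ =\ \tfrac12 E(\rho) - \tfrac12 E(\rho_0). \tag{22} \]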

5. Discussion 
There are various ways to interpret Theorem 3. 

An explanation of the functional $K_h$ and the minimization problem (3). The authors of [JKO98] 
motivate the minimization problem (3) by analogy with the well-known backward Euler 
approximation scheme. Theorem 3 provides an independent explanation of this minimization problem, 
as follows. By the combination of (2) and (19), the value $K_h(\rho\,;\rho_0)$ determines the probability of 
observing $\rho$ at time $h$, given a distribution $\rho_0$ at time zero. Since for large $n$ only near-minimal 
values of $J_h$, and therefore of $K_h$, have non-vanishing probability, this explains why the minimizers 
of $K_h$ arise. It also shows that the minimization problem (3), and specifically the combination 
of the entropy and the Wasserstein terms, is not just a mathematical construct but also carries 
physical meaning. 
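
The backward Euler analogy can be made concrete in the simplest finite-dimensional setting. The sketch below is an illustration under assumptions that are not in the paper: the state space is $\mathbb{R}$ with the Euclidean distance and the energy is the quadratic $E(x) = a x^2/2$, for which the minimizer of $\frac{1}{2h}(x - x^{n-1})^2 + E(x)$ is available in closed form. Iterating this minimization approximates the gradient flow $\dot x = -a x$.

```python
import math

def backward_euler_step(x_prev, h, a):
    """Minimiser of (1/(2h)) (x - x_prev)^2 + a x^2 / 2 over x in R.

    Setting the derivative to zero gives (x - x_prev)/h + a x = 0.
    """
    return x_prev / (1.0 + h * a)

def minimizing_movement(x0, h, a, t_end):
    """Iterate the one-step minimisation up to time t_end (assumed a multiple of h)."""
    x = x0
    for _ in range(round(t_end / h)):
        x = backward_euler_step(x, h, a)
    return x

# The discrete scheme approaches the exact flow x(t) = x0 * exp(-a t) as h -> 0.
approx = minimizing_movement(1.0, 1e-3, 2.0, 1.0)
print(approx, math.exp(-2.0))
```

In the scheme (3) the Euclidean distance is replaced by the Wasserstein distance and $E$ by the entropy, so the one-step problem no longer has a closed-form minimizer.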

A related interpretation stems from the fact that (2) characterizes not only the most probable 
state, but also the fluctuations around that state. Therefore $J_h$ and by (19) also $K_h$ not only 
carry meaning in their respective minimizers, but also in the behaviour away from the minimum. 
Put succinctly: $K_h$ also characterizes the fluctuation behaviour of the particle system, for large 
but finite $n$. 

A microscopic explanation of the entropy-Wasserstein gradient flow. The diffusion equation (1) 
is a gradient flow in many ways simultaneously: it is the gradient flow of the Dirichlet integral 
$\tfrac12 \int |\nabla \rho|^2$ with respect to the $L^2$ metric, of $\tfrac12 \int \rho^2$ with respect to the $H^{-1}$ metric; more generally, 
of the $H^s$ semi-norm with respect to the $H^{s-1}$ metric. In addition there is of course the gradient 
flow of the entropy $E$ with respect to the Wasserstein metric. 

Theorem 3 shows that among these the entropy-Wasserstein combination is special, in the 
sense that it not only captures the deterministic limit, i.e., equation (1), but also the fluctuation 
behaviour at large but finite $n$. Other gradient flows may also produce (1), but they will not capture 
the fluctuations, for this specific stochastic system. Of course, there may be other stochastic 
particle systems for which not the entropy-Wasserstein combination but another combination 
reproduces the fluctuation behaviour. 

There is another way to motivate the combination of entropy and the Wasserstein distance. 
In [KO90] the authors derive a rate functional for the time-continuous problem, which is therefore 
a functional on a space of space-time functions such as $C(0,\infty; L^1(\mathbb{R}^d))$. The relevant term for 
this discussion is 

\[ I(\rho) := \inf_v \Big\{ \int_0^\infty \!\! \int_{\mathbb{R}^d} |v(x,t)|^2 \rho(x,t)\, dx\, dt \ :\ \partial_t \rho = \Delta \rho + \operatorname{div} \rho v \Big\}. \]

If we rewrite this infimum by $v = w - \nabla(\log \rho + 1)$ instead as 

\[ \inf_w \Big\{ \int_0^\infty \!\! \int_{\mathbb{R}^d} \big| w(x,t) - \nabla(\log \rho + 1) \big|^2 \rho(x,t)\, dx\, dt \ :\ \partial_t \rho = \operatorname{div} \rho w \Big\}, \]

then we recognize that this expression penalizes deviation of $w$ from the variational derivative (or 
$L^2$-gradient) $\log \rho + 1$ of $E$. Since the expression $\int_{\mathbb{R}^d} |v|^2 \rho \, dx$ can be interpreted as the derivative 
of the Wasserstein distance (see [Ott01] and [AGS05, Ch. 8]), this provides again a connection 
between the entropy and the Wasserstein distance. 

The origin of the Wasserstein distance. The proof of Theorem 3 also allows us to trace back 
the origin of the Wasserstein distance in the limiting functional $K_h$. It is useful to compare $J_h$ 
and $K_h$ in a slightly different form. Namely, using (13) and the expression of $H$ introduced in (25) 
below, we write 

\[ J_h(\rho\,;\rho_0) = \inf_{q \in \Gamma(\rho_0,\rho)} \Big\{ E(q) - E(\rho_0) + \log 2\sqrt{\pi h} + \frac{1}{4h} \iint_{\mathbb{R}\times\mathbb{R}} (x-y)^2 q(x,y)\, dx\, dy \Big\}, \tag{23} \]
\[ \tfrac12 K_h(\rho\,;\rho_0) = \tfrac12 E(\rho) - \tfrac12 E(\rho_0) + \frac{1}{4h} \inf_{q \in \Gamma(\rho_0,\rho)} \iint_{\mathbb{R}\times\mathbb{R}} (x-y)^2 q(x,y)\, dx\, dy. \]

One similarity between these expressions is the form of the last term in both lines, combined with 
the minimization over $q$. Since that last term is prefixed by the large factor $1/4h$, one expects it 
to dominate the minimization for small $h$, which is consistent with the passage from the first to 
the second line. 

In this way the Wasserstein distance in $K_h$ arises from the last term in (23). Tracing back the 
origin of that term, we find that it originates in the exponent $(x-y)^2/4h$ in $p_h$ (see (5)), which 
itself arises from the Central Limit Theorem. In this sense the Wasserstein distance arises from 
the same Central Limit Theorem that provides the properties of Brownian motion in the first 
place. 

This also explains, for instance, why we find the Wasserstein distance of order 2 instead of 
any of the other orders. This observation also raises the question whether stochastic systems 
with heavy-tail behaviour, such as observed in fracture networks [BS98, BSS00] or near the glass 
transition [WW02], would be characterized by a different gradient-flow structure. 

A macroscopic description of the particle system as an entropic gradient flow. For the simple 
particle system under consideration, the macroscopic description by means of the diffusion equation 
is well known; the equivalent description as an entropic gradient flow is physically natural, but 
much more recent. The method presented in this paper is a way to obtain this entropic gradient 
flow directly as the macroscopic description, without having to consider solutions of the diffusion 
equation. This rigorous passage to a physically natural macroscopic limit may lead to a deeper 
understanding of particle systems, in particular in situations where the gradient flow formulation 
is mathematically more tractable. 

Future work. Besides the natural question of generalizing Theorem 3 to a larger class of 
probability measures, including measures in higher dimensions, there are various other interesting avenues 
of investigation. A first class of extensions is suggested by the many differential equations that 
can be written in terms of Wasserstein gradient flows, as explained in Section 3: can these also be 
related to large-deviation principles for well-chosen stochastic particle systems? Note that many 
of these equations correspond to systems of interacting particles, and therefore the large-deviation 
result of this paper will need to be generalized. 

Further extensions follow from relaxing the assumptions on the Brownian motion. Kramers' 
equation, for instance, describes the motion of particles that perform a Brownian motion in velocity 
space, with the position variable following deterministically from the velocity. The characterization 
by Huang and Jordan [Hua00, HJ00] of this equation as a gradient flow with respect to a 
modified Wasserstein metric suggests a similar connection between gradient-flow and large-deviations 
structure. 



6. Outline of the arguments 

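Since most of the appearances of $h$ are combined with a factor 4, it is notationally useful to 
incorporate the 4 into it. We do this by introducing the new small parameter 

\[ \varepsilon^2 := 4h, \]

and we redefine the functional $K_h$ of equation (3), 

\[ \tfrac12 K_\varepsilon(\rho\,;\rho_0) := \frac{1}{\varepsilon^2}\, d(\rho,\rho_0)^2 + \tfrac12 E(\rho) - \tfrac12 E(\rho_0), \]

and analogously for the rate functional, 

\[ J_\varepsilon(\rho\,;\rho_0) := \inf_{q \in \Gamma(\rho_0,\rho)} H(q\,|\,q_0), \tag{24} \]

where $q_0(dx\,dy) = \rho_0(dx)\, p_\varepsilon(x,y)\, dy$, with 

\[ p_\varepsilon(x,y) := \frac{1}{\varepsilon\sqrt{\pi}}\, e^{-(y-x)^2/\varepsilon^2}, \]

in analogy to (5) and (8). Note that 

\[ H(q\,|\,q_0) = E(q) - \iint q(x,y) \log\big[\rho_0(x)\, p_\varepsilon(x,y)\big]\, dx\, dy \]
\[ \phantom{H(q\,|\,q_0)} = E(q) - E(\rho_0) + \tfrac12 \log \varepsilon^2\pi + \frac{1}{\varepsilon^2} \iint_{\mathbb{R}\times\mathbb{R}} (x-y)^2 q(x,y)\, dx\, dy, \tag{25} \]

where we abuse notation and write $E(q) = \iint_{\mathbb{R}\times\mathbb{R}} q(x,y) \log q(x,y)\, dx\, dy$. 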

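6.1. Properties of the Wasserstein distance. We now discuss a few known properties of the 
Wasserstein distance. 

Lemma 4 (Kantorovich dual formulation [Vil03, AGS05, Vil08]). Let $\rho_0, \rho_1 \in \mathcal{P}_2(\mathbb{R})$ be absolutely 
continuous with respect to Lebesgue measure. Then 

\[ d(\rho_0,\rho_1)^2 = \sup_\varphi \Big\{ \int_{\mathbb{R}} (x^2 - 2\varphi(x))\, \rho_0(x)\, dx + \int_{\mathbb{R}} (y^2 - 2\varphi^*(y))\, \rho_1(y)\, dy \ :\ \varphi : \mathbb{R} \to \mathbb{R} \text{ convex} \Big\}, \tag{26} \]

where $\varphi^*$ is the convex conjugate (Legendre-Fenchel transform) of $\varphi$, and where the supremum is 
achieved. In addition, at $\rho_0$-a.e. $x$ the optimal function $\varphi$ is twice differentiable, and 

\[ \varphi''(x) = \frac{\rho_0(x)}{\rho_1(\varphi'(x))}. \tag{27} \]

A similar statement holds for $\varphi^*$, 

\[ (\varphi^*)''(y) = \frac{\rho_1(y)}{\rho_0((\varphi^*)'(y))}. \tag{28} \]

For an absolutely continuous $q \in \mathcal{P}_2(\mathbb{R} \times \mathbb{R})$ we will often use the notation 

\[ d(q)^2 := \iint_{\mathbb{R}\times\mathbb{R}} (x-y)^2 q(x,y)\, dx\, dy. \]

Note that 

\[ d(\rho_0,\rho_1) = \inf\{ d(q) : \pi_{0,1} q = \rho_{0,1} \}, \]

and that if $\pi_{0,1} q = \rho_{0,1}$, and if the convex functions $\varphi$, $\varphi^*$ are associated with $d(\rho_0,\rho_1)$ as above, 
then the difference can be expressed as 

\[ d(q)^2 - d(\rho_0,\rho_1)^2 = \iint_{\mathbb{R}\times\mathbb{R}} (x-y)^2 q(x,y)\, dx\, dy - \iint_{\mathbb{R}\times\mathbb{R}} (x^2 - 2\varphi(x))\, q(x,y)\, dx\, dy - \iint_{\mathbb{R}\times\mathbb{R}} (y^2 - 2\varphi^*(y))\, q(x,y)\, dx\, dy \]
\[ \phantom{d(q)^2 - d(\rho_0,\rho_1)^2} = 2 \iint_{\mathbb{R}\times\mathbb{R}} \big( \varphi(x) + \varphi^*(y) - xy \big)\, q(x,y)\, dx\, dy. \tag{29} \]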


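6.2. Pair measures and $\tilde q^\varepsilon$. A central role is played by the following, explicit measure in 
$\mathcal{P}_2(\mathbb{R} \times \mathbb{R})$. For given $\rho_0 \in \mathcal{M}_1(\mathbb{R})$ and a sequence of absolutely continuous measures $\rho^\varepsilon \in \mathcal{M}_1(\mathbb{R})$, we 
define the absolutely continuous measure $\tilde q^\varepsilon \in \mathcal{M}_1(\mathbb{R} \times \mathbb{R})$ by 

\[ \tilde q^\varepsilon(x,y) := Z_\varepsilon^{-1}\, \frac{1}{\varepsilon\sqrt{\pi}}\, \sqrt{\rho_0(x)}\, \sqrt{\rho^\varepsilon(y)}\, \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi_\varepsilon(x) - \varphi_\varepsilon^*(y) \big) \Big], \tag{30} \]

where the normalization constant is defined as 

\[ Z_\varepsilon = Z_\varepsilon(\rho_0,\rho^\varepsilon) := \frac{1}{\varepsilon\sqrt{\pi}} \iint_{\mathbb{R}\times\mathbb{R}} \sqrt{\rho_0(x)}\, \sqrt{\rho^\varepsilon(y)}\, \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi_\varepsilon(x) - \varphi_\varepsilon^*(y) \big) \Big]\, dx\, dy. \tag{31} \]

In these expressions, the functions $\varphi_\varepsilon$, $\varphi_\varepsilon^*$ are associated with $d(\rho_0,\rho^\varepsilon)$ as by Lemma 4. Note that 
the marginals of $\tilde q^\varepsilon$ are not equal to $\rho_0$ and $\rho^\varepsilon$, but they do converge (see the proof of part 2 of 
Theorem 3) to $\rho_0$ and the limit $\rho$ of $\rho^\varepsilon$. 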

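6.3. Properties of $\tilde q^\varepsilon$ and $Z_\varepsilon$. The role of $\tilde q^\varepsilon$ can best be explained by the following observations. 
We first discuss the lower bound, part 1 of Theorem 3. If $q^\varepsilon$ is optimal in the definition of 
$J_\varepsilon(\rho^\varepsilon\,;\rho_0)$ (implying that it has marginals $\rho_0$ and $\rho^\varepsilon$), then 

\[ 0 \le H(q^\varepsilon\,|\,\tilde q^\varepsilon) = E(q^\varepsilon) - \iint q^\varepsilon \log \tilde q^\varepsilon \]
\[ = E(q^\varepsilon) + \log Z_\varepsilon + \tfrac12 \log \varepsilon^2\pi - \tfrac12 \iint q^\varepsilon(x,y) \big[ \log \rho_0(x) + \log \rho^\varepsilon(y) \big]\, dx\, dy + \frac{2}{\varepsilon^2} \iint q^\varepsilon(x,y) \big[ \varphi_\varepsilon(x) + \varphi_\varepsilon^*(y) - xy \big]\, dx\, dy \]
\[ \overset{(29)}{=} E(q^\varepsilon) - \tfrac12 E(\rho_0) - \tfrac12 E(\rho^\varepsilon) + \frac{1}{\varepsilon^2} \big[ d(q^\varepsilon)^2 - d(\rho_0,\rho^\varepsilon)^2 \big] + \log Z_\varepsilon + \tfrac12 \log \varepsilon^2\pi \]
\[ = J_\varepsilon(\rho^\varepsilon\,;\rho_0) - \frac{1}{\varepsilon^2}\, d(\rho_0,\rho^\varepsilon)^2 - \tfrac12 E(\rho^\varepsilon) + \tfrac12 E(\rho_0) + \log Z_\varepsilon. \tag{32} \]

The lower-bound estimate 

\[ \liminf_{\varepsilon \to 0}\ J_\varepsilon(\rho^\varepsilon\,;\rho_0) - \frac{1}{\varepsilon^2}\, d(\rho_0,\rho^\varepsilon)^2 \ \ge\ \tfrac12 E(\rho) - \tfrac12 E(\rho_0) \]

then follows from the Lemma below, which is proved in Section 8. 

Lemma 5. We have 

(1) $\liminf_{\varepsilon \to 0} E(\rho^\varepsilon) \ge E(\rho)$; 

(2) $\limsup_{\varepsilon \to 0} Z_\varepsilon \le 1$. 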

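For the recovery sequence, part 2 of Theorem 3, we first define the functional $G_\varepsilon : \mathcal{M}_1(\mathbb{R} \times \mathbb{R}) \to \mathbb{R}$ by 

\[ G_\varepsilon(q) := H\big(q \,\big|\, (\pi_0 q) P^\varepsilon\big) - \frac{1}{\varepsilon^2}\, d(\pi_0 q, \pi_1 q)^2. \]

Note that by (25) and (29), for any $q$ such that $\pi_0 q = \rho_0$ we have 

\[ G_\varepsilon(q) = E(q) - E(\rho_0) + \tfrac12 \log \varepsilon^2\pi + \frac{2}{\varepsilon^2} \inf_\varphi \Big\{ \iint q(x,y) \big( \varphi(x) + \varphi^*(y) - xy \big)\, dx\, dy \ :\ \varphi \text{ convex} \Big\}. \tag{33} \]

Now choose for $\varphi$ the optimal convex function in the definition of $d(\rho_0,\rho)$, and let the function 
$\tilde q^\varepsilon$ be given by (30), where $\rho^\varepsilon$, $\varphi_\varepsilon$, and $\varphi_\varepsilon^*$ are replaced by the fixed functions $\rho_1 := \rho$, $\varphi$, and $\varphi^*$. Define 
the correction factor $\chi_\varepsilon \in L^1(\pi_0 \tilde q^\varepsilon)$ by the condition 

\[ \rho_0(x) = \chi_\varepsilon(x)\, \pi_0 \tilde q^\varepsilon(x). \tag{34} \]

We then set 

\[ q^\varepsilon(x,y) = \chi_\varepsilon(x)\, \tilde q^\varepsilon(x,y) = Z_\varepsilon^{-1}\, \frac{1}{\varepsilon\sqrt{\pi}}\, \chi_\varepsilon(x) \sqrt{\rho_0(x)} \sqrt{\rho_1(y)}\, \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi(x) - \varphi^*(y) \big) \Big], \tag{35} \]

so that the first marginal $\pi_0 q^\varepsilon$ equals $\rho_0$; in Lemma 6 below we show that the second marginal 
converges to $\rho_1$. Note that the normalization constant $Z_\varepsilon$ above is the same as for $\tilde q^\varepsilon$, i.e., 

\[ Z_\varepsilon = \frac{1}{\varepsilon\sqrt{\pi}} \int_{\mathbb{R}} \int_{\mathbb{R}} \sqrt{\rho_0(x)} \sqrt{\rho_1(y)}\, \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi(x) - \varphi^*(y) \big) \Big]\, dx\, dy. \]

Since the functions $\varphi$ and $\varphi^*$ are admissible for $d(\pi_0 q^\varepsilon, \pi_1 q^\varepsilon)$, we find with (26) 

\[ d(\pi_0 q^\varepsilon, \pi_1 q^\varepsilon)^2 \ge \int_{\mathbb{R}} (x^2 - 2\varphi(x))\, \pi_0 q^\varepsilon(x)\, dx + \int_{\mathbb{R}} (y^2 - 2\varphi^*(y))\, \pi_1 q^\varepsilon(y)\, dy \]
\[ \phantom{d(\pi_0 q^\varepsilon, \pi_1 q^\varepsilon)^2} = \iint_{\mathbb{R}\times\mathbb{R}} \big[ x^2 - 2\varphi(x) - 2\varphi^*(y) + y^2 \big]\, q^\varepsilon(x,y)\, dx\, dy. \]

Then 

\[ G_\varepsilon(q^\varepsilon) \le E(q^\varepsilon) - E(\rho_0) + \tfrac12 \log \varepsilon^2\pi + \frac{2}{\varepsilon^2} \iint_{\mathbb{R}\times\mathbb{R}} q^\varepsilon(x,y) \big( \varphi(x) + \varphi^*(y) - xy \big)\, dx\, dy \]
\[ = -\log Z_\varepsilon + \iint_{\mathbb{R}\times\mathbb{R}} q^\varepsilon(x,y) \log \chi_\varepsilon(x)\, dx\, dy + \tfrac12 \iint_{\mathbb{R}\times\mathbb{R}} q^\varepsilon(x,y) \log \rho_1(y)\, dx\, dy - \tfrac12 \iint_{\mathbb{R}\times\mathbb{R}} q^\varepsilon(x,y) \log \rho_0(x)\, dx\, dy \]
\[ = -\log Z_\varepsilon + \int_{\mathbb{R}} \rho_0(x) \log \chi_\varepsilon(x)\, dx + \tfrac12 \int_{\mathbb{R}} \pi_1 q^\varepsilon(y) \log \rho_1(y)\, dy - \tfrac12 \int_{\mathbb{R}} \rho_0(x) \log \rho_0(x)\, dx. \]

The property (22) then follows from the lower bound and Lemma 6 below, which is proved in 
Section 7. 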

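Lemma 6. We have 

(1) $\lim_{\varepsilon \to 0} Z_\varepsilon = 1$; 

(2) $\pi_{0,1} \tilde q^\varepsilon$ and $\chi_\varepsilon$ are bounded on $(0,L)$ from above and away from zero, uniformly in $\varepsilon$; 

(3) $\chi_\varepsilon \to 1$ in $L^1(0,L)$; 

(4) $\pi_1 q^\varepsilon \to \rho_1$ in $L^1(0,L)$. 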

7. Upper bound 

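In this section we prove Lemma 6, and we place ourselves in the context of the recovery property, 
part 2 of Theorem 3. Therefore we are given $\rho_0, \rho_1 \in A_\delta$ with $\rho_0 \in C([0,L])$, and as described 
in Section 6.3 we have constructed the pair measures $q^\varepsilon$ and $\tilde q^\varepsilon$ as in (35); the convex function $\varphi$ 
is associated with $d(\rho_0,\rho_1)$. The parameter $\delta$ will be determined in the proof of the lower bound; 
for the upper bound it is sufficient that $0 < \delta < 1/2$, and therefore that $1/2 < \rho_0, \rho_1 < 3/2$. Note 
that this implies that $\varphi''$ and $\varphi^{*\prime\prime}$ are bounded between 1/3 and 3. 

By Aleksandrov's theorem [EG92, Th. 6.4.1] the convex function $\varphi^*$ is twice differentiable at 
Lebesgue-almost every point $y \in \mathbb{R}$. Let $N_x \subset \mathbb{R}$ be the set where $\varphi$ is not differentiable; this is 
a Lebesgue null set. Let $N_y \subset \mathbb{R}$ be the set at which $\varphi^*$ is not twice differentiable, or at which 
$(\varphi^*)''$ does exist but vanishes; the first set of points is a Lebesgue null set, and the second is a 
$\rho_1$-null set by (28); therefore $\rho_1(N_y) = 0$. Now set 

\[ N = N_x \cup \partial\varphi^*(N_y). \]

Here $\partial\varphi^*$ is the (multi-valued) sub-differential of $\varphi^*$. Then $\rho_0(N) \le \rho_0(N_x) + \rho_0(\partial\varphi^*(N_y)) = 
0 + \rho_0(\partial\varphi^*(N_y)) = \rho_1(N_y) = 0$, where the second identity follows from [McC97, Lemma 4.1]. 
Then, since $\varphi^{*\prime}(\varphi'(x)) = x$, we have for any $x \in \mathbb{R} \setminus N$, 

\[ \varphi^*(y) = \varphi^*(\varphi'(x)) + x\,(y - \varphi'(x)) + \tfrac12 \varphi^{*\prime\prime}(\varphi'(x))\,(y - \varphi'(x))^2 + o\big((y - \varphi'(x))^2\big), \]

so that, using $\varphi(x) + \varphi^*(\varphi'(x)) = x \varphi'(x)$, 

\[ \varphi(x) + \varphi^*(y) - xy = \tfrac12 \varphi^{*\prime\prime}(\varphi'(x))\,(y - \varphi'(x))^2 + o\big((y - \varphi'(x))^2\big). \]

Therefore for each $x \in \mathbb{R} \setminus N$ the single integral 

\[ \frac{1}{\varepsilon\sqrt{\pi}} \int_{\mathbb{R}} \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi(x) - \varphi^*(y) \big) \Big]\, dy = \frac{1}{\varepsilon\sqrt{\pi}} \int_{\mathbb{R}} \exp\Big[ -\frac{1}{\varepsilon^2}\, \varphi^{*\prime\prime}(\varphi'(x))\,(y - \varphi'(x))^2 + o\big(\varepsilon^{-2}(y - \varphi'(x))^2\big) \Big]\, dy \tag{36} \]

can be shown by Watson's Lemma to converge to 

\[ \frac{1}{\sqrt{\varphi^{*\prime\prime}(\varphi'(x))}}. \]

By Fatou's Lemma, therefore, 

\[ \liminf_{\varepsilon \to 0} Z_\varepsilon \ge 1. \tag{37} \]

By the same argument as above, and using the lower bound 1/3 on $\varphi''$ and $\varphi^{*\prime\prime}$, we find that 

\[ xy - \varphi(x) - \varphi^*(y) \le \min\Big\{ -\tfrac16 \big( x - \varphi^{*\prime}(y) \big)^2,\ -\tfrac16 \big( y - \varphi'(x) \big)^2 \Big\}. \tag{38} \]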
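
The Laplace-type limit behind the single integral in (36) can be checked numerically. The sketch below is only an illustration under stated assumptions: the convex function $\varphi^*(y) = y^2/2 + y^4/12$ is a hypothetical choice (not from the paper), and the exponent is written as a Bregman divergence, using that $xy - \varphi(x) - \varphi^*(y) = -[\varphi^*(y) - \varphi^*(c) - \varphi^{*\prime}(c)(y-c)]$ when $c = \varphi'(x)$. The value should approach $1/\sqrt{\varphi^{*\prime\prime}(c)}$ as $\varepsilon \to 0$.

```python
import numpy as np

# Hypothetical smooth convex function standing in for phi^* (an assumption).
def phi_star(y):
    return 0.5 * y**2 + y**4 / 12.0

def dphi_star(y):
    return y + y**3 / 3.0

def d2phi_star(y):
    return 1.0 + y**2

def laplace_integral(c, eps, half_width=1.0, m=200_001):
    """(1/(eps*sqrt(pi))) * integral of exp(-(2/eps^2) * Bregman(y, c)) dy, by a Riemann sum.

    Here c plays the role of phi'(x); the integrand is negligible outside the window.
    """
    y = np.linspace(c - half_width, c + half_width, m)
    bregman = phi_star(y) - phi_star(c) - dphi_star(c) * (y - c)
    integrand = np.exp(-2.0 * bregman / eps**2)
    dy = y[1] - y[0]
    return float(np.sum(integrand) * dy / (eps * np.sqrt(np.pi)))

c = 1.0
for eps in (0.2, 0.05, 0.01):
    print(eps, laplace_integral(c, eps), 1.0 / np.sqrt(d2phi_star(c)))
```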



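Then we can estimate 

\[ Z_\varepsilon = \frac{1}{\varepsilon\sqrt{\pi}} \int_0^L \!\! \int_0^L \sqrt{\rho_0(x)} \sqrt{\rho_1(y)}\, \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi(x) - \varphi^*(y) \big) \Big]\, dx\, dy \]
\[ \le \frac{1}{\varepsilon\sqrt{\pi}} \int_0^L \!\! \int_0^L \sqrt{\rho_0(\varphi^{*\prime}(y))} \sqrt{\rho_1(y)}\, \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi(x) - \varphi^*(y) \big) \Big]\, dx\, dy \]
\[ \quad + \frac{1}{\varepsilon\sqrt{\pi}} \int_0^L \!\! \int_0^L \Big| \sqrt{\rho_0(x)} - \sqrt{\rho_0(\varphi^{*\prime}(y))} \Big| \sqrt{\rho_1(y)}\, \exp\Big[ -\frac{1}{3\varepsilon^2} \big( x - \varphi^{*\prime}(y) \big)^2 \Big]\, dx\, dy. \tag{39} \]

By the same argument as above, in the first term the inner integral (including the prefactor 
$1/\varepsilon\sqrt{\pi}$) converges at $\rho_1$-almost every $y$ to $\rho_1(y)$ and is bounded by 

\[ \|\rho_0\|_\infty^{1/2} \|\rho_1\|_\infty^{1/2}\, \frac{1}{\varepsilon\sqrt{\pi}} \int_{\mathbb{R}} \exp\Big[ -\frac{1}{3\varepsilon^2} \big( x - \varphi^{*\prime}(y) \big)^2 \Big]\, dx = \sqrt{3}\, \|\rho_0\|_\infty^{1/2} \|\rho_1\|_\infty^{1/2}, \]

so that 

\[ \lim_{\varepsilon \to 0} \frac{1}{\varepsilon\sqrt{\pi}} \int_0^L \!\! \int_0^L \sqrt{\rho_0(\varphi^{*\prime}(y))} \sqrt{\rho_1(y)}\, \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi(x) - \varphi^*(y) \big) \Big]\, dx\, dy = 1. \tag{40} \]

To estimate the second term we note that since $\varphi^{*\prime}$ maps $[0,L]$ to $[0,L]$, we can estimate 

\[ \Big| \sqrt{\rho_0(x)} - \sqrt{\rho_0(\varphi^{*\prime}(y))} \Big| \le \omega\big( |x - \varphi^{*\prime}(y)| \big) \qquad \text{for all } (x,y) \in [0,L] \times [0,L], \]

where $\omega$ is the modulus of continuity of $\sqrt{\rho_0} \in C([0,L])$. Then, for any $\eta > 0$, 

\[ \frac{1}{\varepsilon\sqrt{\pi}} \int_0^L \!\! \int_0^L \Big| \sqrt{\rho_0(x)} - \sqrt{\rho_0(\varphi^{*\prime}(y))} \Big| \sqrt{\rho_1(y)}\, \exp\Big[ -\frac{1}{3\varepsilon^2} \big( x - \varphi^{*\prime}(y) \big)^2 \Big]\, dx\, dy \]
\[ \le \frac{1}{\varepsilon\sqrt{\pi}}\, \omega(\eta)\, \|\rho_1\|_\infty^{1/2} \int_0^L \!\! \int_{\{x \in [0,L]\,:\,|x - \varphi^{*\prime}(y)| \le \eta\}} \exp\Big[ -\frac{1}{3\varepsilon^2} \big( x - \varphi^{*\prime}(y) \big)^2 \Big]\, dx\, dy \]
\[ \quad + \frac{1}{\varepsilon\sqrt{\pi}}\, \|\rho_0\|_\infty^{1/2} \|\rho_1\|_\infty^{1/2} \int_0^L \!\! \int_{\{x \in [0,L]\,:\,|x - \varphi^{*\prime}(y)| > \eta\}} \exp\Big[ -\frac{1}{3\varepsilon^2} \big( x - \varphi^{*\prime}(y) \big)^2 \Big]\, dx\, dy \]
\[ \le \omega(\eta)\, \|\rho_1\|_\infty^{1/2}\, L \sqrt{3} + \frac{1}{\varepsilon\sqrt{\pi}}\, \|\rho_0\|_\infty^{1/2} \|\rho_1\|_\infty^{1/2}\, L^2 \exp\Big[ -\frac{\eta^2}{3\varepsilon^2} \Big]. \tag{41} \]

The first term above can be made arbitrarily small by choosing $\eta > 0$ small, and for any fixed 
$\eta > 0$ the second converges to zero as $\varepsilon \to 0$. Combining (37), (39), (40) and (41), we find the 
first part of Lemma 6: 

\[ \lim_{\varepsilon \to 0} Z_\varepsilon = 1. \]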

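Continuing with part 2 of Lemma 6, we note that by (38), e.g., 

\[ \pi_0 \tilde q^\varepsilon(x) \le Z_\varepsilon^{-1}\, \frac{1}{\varepsilon\sqrt{\pi}}\, \sqrt{\rho_0(x)} \int_0^L \sqrt{\rho_1(y)}\, \exp\Big[ -\frac{1}{3\varepsilon^2} \big( y - \varphi'(x) \big)^2 \Big]\, dy \le Z_\varepsilon^{-1}\, \|\rho_0\|_\infty^{1/2} \|\rho_1\|_\infty^{1/2} \sqrt{3}. \]

Since $Z_\varepsilon \to 1$, $\pi_0 \tilde q^\varepsilon$ is uniformly bounded from above. A similar argument holds for the upper 
bound on $\pi_1 \tilde q^\varepsilon$, and by applying upper bounds on $\varphi''$ and $\varphi^{*\prime\prime}$ we also obtain uniform lower bounds 
on $\pi_0 \tilde q^\varepsilon$ and $\pi_1 \tilde q^\varepsilon$. The boundedness of $\chi_\varepsilon$ then follows from (34) and the bounds on $\rho_0$. 

We conclude with the convergence of the $\chi_\varepsilon$ and $\pi_1 q^\varepsilon$. By (36) and (40) we have for almost all 
$x \in (0,L)$, 

\[ \pi_0 \tilde q^\varepsilon(x) = Z_\varepsilon^{-1} \sqrt{\rho_0(x)}\, \frac{1}{\varepsilon\sqrt{\pi}} \int_0^L \sqrt{\rho_1(y)}\, \exp\Big[ \frac{2}{\varepsilon^2} \big( xy - \varphi(x) - \varphi^*(y) \big) \Big]\, dy \longrightarrow \rho_0(x), \]

and the uniform bounds on $\pi_0 \tilde q^\varepsilon$ imply that $\pi_0 \tilde q^\varepsilon$ converges to $\rho_0$ in $L^1(0,L)$. Therefore also 
$\chi_\varepsilon \to 1$ in $L^1(0,L)$. A similar calculation gives $\pi_1 q^\varepsilon \to \rho_1$ in $L^1(0,L)$. This concludes the proof 
of Lemma 6. □ 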



8. Lower bound



This section gives the proof of the lower-bound estimate, part 1 of Theorem 3. Recall that
in the context of part 1 of Theorem 3 we are given a fixed $\rho_0\in A_\delta\cap C([0,L])$ and a sequence
$(\rho^\varepsilon)\subset A_\delta$ with $\rho^\varepsilon\to\rho$. In Section 6.3 we described how the lower-bound inequality (21)
follows from two inequalities (see Lemma 5). The first of these, $\liminf_{\varepsilon\to0}E(\rho^\varepsilon)\ge E(\rho)$, follows
directly from the convexity of the functional $E$.

The rest of this section is therefore devoted to the proof of the second inequality of Lemma 5,

\[ \limsup_{\varepsilon\to0} Z_\varepsilon \le 1. \tag{42} \]



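Here $Z_\varepsilon$ is defined in (31) as

\[
Z_\varepsilon = \frac{1}{\varepsilon\sqrt{\pi}}\int_{\mathbb R}\int_{\mathbb R}\sqrt{\rho_0(x)}\sqrt{\rho^\varepsilon(y)}\,\exp\Bigl[\frac{2}{\varepsilon^2}\bigl(xy-\varphi_\varepsilon(x)-\varphi_\varepsilon^*(y)\bigr)\Bigr]\,dx\,dy,
\]

where we extend $\rho_0$ and $\rho^\varepsilon$ by zero outside of $[0,L]$, and $\varphi_\varepsilon$ is associated with $d(\rho_0,\rho^\varepsilon)$ as in
Lemma 4. This implies among other things that $\varphi_\varepsilon$ is twice differentiable on $[0,L]$, and

\[
\varphi_\varepsilon''(x) = \frac{\rho_0(x)}{\rho^\varepsilon(\varphi_\varepsilon'(x))} \qquad\text{for all } x\in[0,L]. \tag{43}
\]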
We restrict ourselves to the case $L=1$, that is, to the interval $K:=[0,1]$; by a rescaling
argument this entails no loss of generality. We will prove below that there exists a $0<\hat\delta<1/3$
such that whenever



max < Wpq - 1|1loc(^), sup 



^-1 
Pq 



<S, 



the inequality (42) holds. This implies the assertion of Lemma 5 and concludes the proof of
Theorem 3.



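8.1. Main steps. A central step in the proof is a reformulation of the integral defining $Z_\varepsilon$ in terms
of a convolution. Upon writing $y=\varphi_\varepsilon'(\xi)$ and $x=\xi+\varepsilon z$, and using $\varphi_\varepsilon(\xi)+\varphi_\varepsilon^*(\varphi_\varepsilon'(\xi))=\xi\varphi_\varepsilon'(\xi)$,
we can rewrite the exponent in $Z_\varepsilon$ as

\begin{align}
\frac{2}{\varepsilon^2}\bigl(xy-\varphi_\varepsilon(x)-\varphi_\varepsilon^*(y)\bigr)
&= -\frac{2}{\varepsilon^2}\Bigl[\varphi_\varepsilon(\xi+\varepsilon z)+\varphi_\varepsilon^*(\varphi_\varepsilon'(\xi))-(\xi+\varepsilon z)\varphi_\varepsilon'(\xi)\Bigr] \tag{44}\\
&= -\frac{2}{\varepsilon^2}\Bigl[\varphi_\varepsilon(\xi+\varepsilon z)-\varphi_\varepsilon(\xi)-\varepsilon z\,\varphi_\varepsilon'(\xi)\Bigr] \notag\\
&= -2\int_0^z (z-s)\,\varphi_\varepsilon''(\xi+\varepsilon s)\,ds
 = -(\kappa^z_\varepsilon*\varphi_\varepsilon'')(\xi)\,z^2, \tag{45}
\end{align}

where we define the convolution kernel $\kappa^z_\varepsilon$ by
$\kappa^z_\varepsilon(s)=\varepsilon^{-1}\kappa^z(\varepsilon^{-1}s)$ and

\[
\kappa^z(\sigma) = \begin{cases}
\dfrac{2}{z^2}(z+\sigma) & \text{if } -z\le\sigma\le0,\\[1ex]
-\dfrac{2}{z^2}(z+\sigma) & \text{if } 0\le\sigma\le-z,\\[1ex]
0 & \text{otherwise.}
\end{cases} \tag{46}
\]

Figure 1. The function $\kappa^z$ for negative and positive values of $z$.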
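The rewriting of the exponent rests on Taylor's formula with integral remainder, $f(\xi+\varepsilon z)-f(\xi)-\varepsilon z f'(\xi)=\varepsilon^2\int_0^z(z-s)f''(\xi+\varepsilon s)\,ds$, valid for both signs of $z$. The following minimal sketch checks this identity numerically. The test function $f=\sin$, the helper `simpson` and the sample values of $\xi$, $\varepsilon$, $z$ are our own illustrative assumptions and are not objects from the proof.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def taylor_remainder_gap(xi, eps, z):
    """|LHS - RHS| of the integral-remainder identity for f = sin (so f'' = -sin)."""
    lhs = math.sin(xi + eps * z) - math.sin(xi) - eps * z * math.cos(xi)
    rhs = eps**2 * simpson(lambda s: (z - s) * (-math.sin(xi + eps * s)), 0.0, z)
    return abs(lhs - rhs)

# Positive and negative z are both covered (the integral from 0 to z is signed).
for z in (1.7, -2.3):
    print(z, taylor_remainder_gap(0.3, 0.1, z))
```

Both gaps are at the level of rounding error, which is what the identity predicts.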



While the domain of definition of (44) is a convenient rectangle $K^2=[0,1]^2$, after transforming
to (45) this domain becomes an inconvenient $\varepsilon$-dependent parallelogram in terms of $z$ and $\xi$. The
following Lemma therefore allows us to switch to a more convenient setting, in which we work on
the flat torus $T=\mathbb R/\mathbb Z$ (for $\xi$) and $\mathbb R$ (for $z$).

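Lemma 7. Set $u\in L^\infty(T)$ to be the periodic function on the torus $T$ such that $u(\xi)=\varphi_\varepsilon''(\xi)$ for
all $\xi\in K$ (in particular, $u>0$). There exists a function $\omega\in C([0,\infty))$ with $\omega(0)=0$, depending
only on $\rho_0$, such that for all $\delta<1/3$

\[
\sqrt{\pi}\,Z_\varepsilon \le \omega(\varepsilon) + \int_T \rho_0(\xi)\sqrt{u(\xi)}\int_{\mathbb R}\exp\bigl[-(\kappa^z_\varepsilon*u)(\xi)\,z^2\bigr]\,dz\,d\xi.
\]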



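Given this Lemma it is sufficient to estimate the integral above. To explain the main argument
that leads to the inequality (42), we give a heuristic description that is mathematically false but
morally correct; this will be remedied below.

We approximate in the integral an expression of the form $e^{a-b}$ by $e^{a}(1-b)$ (let us call this perturbation
1), and we set $\rho_0\equiv1$ (perturbation 2). Then

\[
\int_T\sqrt{u(\xi)}\int_{\mathbb R}\exp\bigl[-(\kappa^z_\varepsilon*u)(\xi)z^2\bigr]\,dz\,d\xi
\approx \int_T\sqrt{u(\xi)}\int_{\mathbb R}e^{-u(\xi)z^2}\bigl[1-(\kappa^z_\varepsilon*u-u)(\xi)\,z^2\bigr]\,dz\,d\xi.
\]

The first term can be calculated by setting $\zeta=z\sqrt{u(\xi)}$,

\[ \int_T\int_{\mathbb R}e^{-\zeta^2}\,d\zeta\,d\xi = \int_T\sqrt{\pi}\,d\xi = \sqrt{\pi}. \]

In the second term we approximate $(\kappa^z_\varepsilon*u)(\xi)-u(\xi)$ by $c\,u''(\xi)\,\varepsilon^2z^2$, where $c>0$ is a constant
proportional to $\int s^2\kappa^1(s)\,ds$ (this is perturbation 3). Then this term becomes, using the same
transformation to $\zeta$ as above,

\[
-c\varepsilon^2\int_T\int_{\mathbb R}\sqrt{u(\xi)}\,e^{-u(\xi)z^2}u''(\xi)\,z^4\,dz\,d\xi
= -c\varepsilon^2\int_T\frac{u''(\xi)}{u(\xi)^2}\,d\xi\int_{\mathbb R}e^{-\zeta^2}\zeta^4\,d\zeta
= -2c\varepsilon^2\int_{\mathbb R}e^{-\zeta^2}\zeta^4\,d\zeta\int_T\frac{u'(\xi)^2}{u(\xi)^3}\,d\xi. \tag{47}
\]

Therefore this term is negative and of order $\varepsilon^2$ as $\varepsilon\to0$, and the inequality (42) follows.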
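The heuristic relies on two properties of the kernel: it has unit mass, so that $\kappa^z_\varepsilon*u-u$ is small, and its second moment scales like $z^2$, which produces the factor $\varepsilon^2z^2$. The sketch below checks both for the triangular kernel $\kappa^z(\sigma)=\pm 2z^{-2}(z+\sigma)$ supported between $-z$ and $0$. The prefactor $2$ is our reading of the kernel formula, fixed by requiring unit mass, and should be treated as an assumption; `kappa`, `kernel_moments` and `simpson` are illustrative names.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def kappa(z, sigma):
    """Assumed kernel: 2(z+sigma)/z^2 on [-z, 0] if z > 0, mirrored sign if z < 0."""
    if z > 0 and -z <= sigma <= 0:
        return 2 * (z + sigma) / z**2
    if z < 0 and 0 <= sigma <= -z:
        return -2 * (z + sigma) / z**2
    return 0.0

def kernel_moments(z):
    """Return (mass, second moment) of kappa^z, integrated over its support."""
    a, b = (-z, 0.0) if z > 0 else (0.0, -z)
    mass = simpson(lambda s: kappa(z, s), a, b)
    second = simpson(lambda s: s * s * kappa(z, s), a, b)
    return mass, second

for z in (1.5, -0.7):
    mass, second = kernel_moments(z)
    print(z, mass, second, z * z / 6)  # second moment should equal z^2 / 6
```

Under this normalisation the second moment is $z^2/6$, so a formal Taylor expansion of $\kappa^z_\varepsilon*u-u$ has even part $\varepsilon^2z^2u''/12$.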

The full argument below is based on this principle, but corrects for the three perturbations
made above. Note that the difference

\[ e^{a-b}-e^{a}(1-b) \tag{48} \]

is positive, so that the ensuing correction competes with (47). In addition, both the beneficial
contribution from (47) and the detrimental contribution from (48) are of order $\varepsilon^2$. The argument
only works because the corresponding constants happen to be ordered in the right way, and then
only when $\|u-1\|_\infty$ is small. This is the reason for the restriction represented by $\hat\delta$.

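8.2. Proof of Lemma 7. Since $\delta<1/3$, (43) implies that $\varphi_\varepsilon'$ is Lipschitz on $K$, and we can
transform $Z_\varepsilon$ following the sequence (44)–(46), using $\operatorname{supp}\rho_0,\rho^\varepsilon=K$:

\begin{align*}
\sqrt{\pi}\,Z_\varepsilon
&= \frac{1}{\varepsilon}\int_K\int_K\sqrt{\rho_0(x)}\sqrt{\rho^\varepsilon(y)}\,\exp\Bigl[\frac{2}{\varepsilon^2}\bigl(xy-\varphi_\varepsilon(x)-\varphi_\varepsilon^*(y)\bigr)\Bigr]\,dx\,dy\\
&= \int_K\sqrt{\rho_0(\xi)}\sqrt{\varphi_\varepsilon''(\xi)}\int_{(K-\xi)/\varepsilon}\sqrt{\rho_0(\xi+\varepsilon z)}\,\exp\bigl[-(\kappa^z_\varepsilon*\varphi_\varepsilon'')(\xi)\,z^2\bigr]\,dz\,d\xi,
\end{align*}

where we used (43) in the last line.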

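Note that $(\kappa^z_\varepsilon*\varphi_\varepsilon'')(\xi)\,z^2=(\kappa^z_\varepsilon*u)(\xi)\,z^2$ for all $z\in\mathbb R$ and for all $\xi\in K^{\varepsilon z}$, where $K^{\varepsilon z}$ is the
interval $K$ from which an interval of length $\varepsilon|z|$ has been removed from the left (if $z<0$) or from
the right (if $z>0$). Therefore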



V^Z,- I poiO^MO / exp[-«*u)(e)z2]dzdC 
Jm 

\/po{OVHO(\/po{^ + £z) - ^^p(^) exp[-(K^ * u){Oz^] d^dz 



K\K' 



K\K^ 



\/po{0\/W}Vpo{^ + £z) cM-K * u){Oz'^] d^dz 
PoiOVui^f'M-K * d^dz. 
The final term is negative and we discard it. From the assumption $\delta<1/2$ we deduce $\|u-1\|_\infty<1/2$,
so that the first term on the right-hand side can be estimated from above (in terms of the
modulus of continuity $\omega_{\rho_0}$ of $\rho_0$) by



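which converges to zero as $\varepsilon\to0$, with a rate of convergence that depends only on $\rho_0$. Similarly,
the middle term we estimate by

\[
\|\rho_0\|_{L^\infty(K)}\,\|\sqrt{u}\|_{L^\infty(K)}\int_{\mathbb R}|K\setminus K^{\varepsilon z}|\,e^{-z^2/2}\,dz \le \Bigl(\frac32\Bigr)^{3/2}\varepsilon\int_{\mathbb R}|z|\,e^{-z^2/2}\,dz,
\]

which converges to zero as $\varepsilon\to0$. □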

8.3. The semi-norm $\|\cdot\|_\varepsilon$. It is convenient to introduce a specific semi-norm for the estimates
that we make below, which takes into account the nature of the convolution expressions. On the
torus $T$ we define

\[ \|u\|_\varepsilon^2 := \sum_{k\in\mathbb Z}|u_k|^2\bigl(1-e^{-\pi^2k^2\varepsilon^2}\bigr), \]

where the $u_k$ are the Fourier coefficients of $u$,

\[ u(x) = \sum_{k\in\mathbb Z}u_k\,e^{2\pi ikx}. \]

The following Lemmas give the relevant properties of this seminorm.
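Lemma 8. For $\varepsilon>0$,

\[ \int_{\mathbb R}e^{-z^2}\int_T\bigl(u(x+\varepsilon z)-u(x)\bigr)^2\,dx\,dz = 2\sqrt{\pi}\,\|u\|_\varepsilon^2. \tag{49} \]

Lemma 9. For $\varepsilon>0$,

\[ \int_{\mathbb R}e^{-z^2}\int_T\bigl(u(x)-\kappa^z_\varepsilon*u(x)\bigr)^2z^4\,dx\,dz \le \frac56\sqrt{\pi}\,\|u\|_\varepsilon^2. \tag{50} \]

Lemma 10. For $\alpha>0$ and $\varepsilon>0$,

\[
\|u\|_{\varepsilon/\alpha} \le \begin{cases}
\|u\|_\varepsilon & \text{if } \alpha\ge1,\\[0.5ex]
\dfrac1\alpha\,\|u\|_\varepsilon & \text{if } 0<\alpha<1,
\end{cases} \tag{51}
\]

where $\|\cdot\|_{\varepsilon/\alpha}$ should be interpreted as $\|\cdot\|_\varepsilon$ with $\varepsilon$ replaced by $\varepsilon/\alpha$.
The proofs of these results are given in the appendix.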
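As an illustration of the identity in Lemma 8, the sketch below evaluates its left-hand side by direct quadrature for the real mode $u(x)=\cos(2\pi kx)$ and compares it with $2\sqrt{\pi}\,\|u\|_\varepsilon^2$, where $\|u\|_\varepsilon^2=\sum_k|u_k|^2(1-e^{-\pi^2k^2\varepsilon^2})$. The choice of test function, the truncation of the $z$-integral to $[-8,8]$, and the helper names are our own assumptions for illustration.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def seminorm_sq_cos(k, eps):
    """||u||_eps^2 for u = cos(2 pi k x), k != 0: only u_k = u_{-k} = 1/2 are nonzero."""
    return 2 * 0.25 * (1 - math.exp(-(math.pi * k * eps) ** 2))

def lemma8_lhs(k, eps, nx=200, nz=400):
    """Double integral of e^{-z^2} (u(x + eps z) - u(x))^2 over T x R (R cut to [-8, 8])."""
    def inner(z):
        def diff_sq(x):
            d = math.cos(2 * math.pi * k * (x + eps * z)) - math.cos(2 * math.pi * k * x)
            return d * d
        return math.exp(-z * z) * simpson(diff_sq, 0.0, 1.0, nx)
    return simpson(inner, -8.0, 8.0, nz)

k, eps = 2, 0.3
print(lemma8_lhs(k, eps), 2 * math.sqrt(math.pi) * seminorm_sq_cos(k, eps))
```

The two printed numbers agree to quadrature accuracy; other modes and values of $\varepsilon$ can be tried by changing `k` and `eps`.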

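8.4. Conclusion. To alleviate notation we drop the caret from $\hat\delta$ and simply write $\delta$. Following
the discussion above we estimate

\begin{align*}
\int_T\rho_0(\xi)\sqrt{u(\xi)}\int_{\mathbb R}\exp\bigl[-(\kappa^z_\varepsilon*u)(\xi)z^2\bigr]\,dz\,d\xi
&= \int_T\int_{\mathbb R}\rho_0(\xi)\sqrt{u(\xi)}\,e^{-u(\xi)z^2}\,dz\,d\xi\\
&\quad+\int_T\int_{\mathbb R}\rho_0(\xi)\sqrt{u(\xi)}\,e^{-u(\xi)z^2}\bigl[u(\xi)-\kappa^z_\varepsilon*u(\xi)\bigr]z^2\,dz\,d\xi + R, \tag{52}
\end{align*}

where

\begin{align*}
R &= \int_T\int_{\mathbb R}\rho_0(\xi)\sqrt{u(\xi)}\,e^{-u(\xi)z^2}\Bigl[\exp\bigl[(u(\xi)-\kappa^z_\varepsilon*u(\xi))z^2\bigr]-1-\bigl(u(\xi)-\kappa^z_\varepsilon*u(\xi)\bigr)z^2\Bigr]\,dz\,d\xi\\
&\le (1+\delta)^{3/2}\int_T\int_{\mathbb R}e^{-u(\xi)z^2}\Bigl[\exp\bigl[(u(\xi)-\kappa^z_\varepsilon*u(\xi))z^2\bigr]-1-\bigl(u(\xi)-\kappa^z_\varepsilon*u(\xi)\bigr)z^2\Bigr]\,dz\,d\xi.
\end{align*}

Since $\|u-1\|_{L^\infty(T)}\le\delta$, we have $\|u-\kappa^z_\varepsilon*u\|_{L^\infty(T)}\le2\delta$ and therefore

\[
\exp\bigl[(u(\xi)-\kappa^z_\varepsilon*u(\xi))z^2\bigr]-1-\bigl(u(\xi)-\kappa^z_\varepsilon*u(\xi)\bigr)z^2 \le \frac12\,e^{2\delta z^2}\bigl(u(\xi)-\kappa^z_\varepsilon*u(\xi)\bigr)^2z^4,
\]

so that

\begin{align*}
R &\le \frac{(1+\delta)^{3/2}}{2}\int_T\int_{\mathbb R}e^{-u(\xi)z^2+2\delta z^2}\bigl(u(\xi)-\kappa^z_\varepsilon*u(\xi)\bigr)^2z^4\,dz\,d\xi\\
&\le \frac{(1+\delta)^{3/2}}{2}\int_T\int_{\mathbb R}e^{-(1-3\delta)z^2}\bigl(u(\xi)-\kappa^z_\varepsilon*u(\xi)\bigr)^2z^4\,dz\,d\xi.
\end{align*}

Setting $\alpha=\sqrt{1-3\delta}$ and $\zeta=\alpha z$, we find

\[
R \le \frac{(1+\delta)^{3/2}}{2(1-3\delta)^{5/2}}\int_T\int_{\mathbb R}e^{-\zeta^2}\bigl(u(\xi)-\kappa^{\zeta/\alpha}_\varepsilon*u(\xi)\bigr)^2\zeta^4\,d\zeta\,d\xi.
\]

Noting that $\kappa^{\zeta/\alpha}_\varepsilon=\kappa^{\zeta}_{\varepsilon/\alpha}$ we have with $\tilde\varepsilon:=\varepsilon/\alpha=\varepsilon(1-3\delta)^{-1/2}$

\[
R \overset{(50)}{\le} \frac{(1+\delta)^{3/2}}{2(1-3\delta)^{5/2}}\,\frac56\sqrt{\pi}\,\|u\|_{\tilde\varepsilon}^2
\overset{(51)}{\le} \frac{(1+\delta)^{3/2}}{2(1-3\delta)^{7/2}}\,\frac56\sqrt{\pi}\,\|u\|_\varepsilon^2. \tag{53}
\]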
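The bound on $R$ uses the elementary inequality $e^t-1-t\le\frac12e^{|t|}t^2$, applied with $t=(u-\kappa^z_\varepsilon*u)z^2$ and $|t|\le2\delta z^2$; it follows by comparing power series term by term, since $2/n!\le1/(n-2)!$ for $n\ge2$. A minimal numerical sanity check on a grid (the grid and the name `remainder_gap` are illustrative assumptions):

```python
import math

def remainder_gap(t):
    """(1/2) e^{|t|} t^2 - (e^t - 1 - t); expm1 avoids cancellation for small t."""
    return 0.5 * math.exp(abs(t)) * t * t - (math.expm1(t) - t)

# Smallest gap over t in [-6, 6]; it should be nonnegative up to rounding.
worst = min(remainder_gap(-6 + 0.01 * i) for i in range(1201))
print(worst >= -1e-12)
```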



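We next calculate

\[
\int_T\rho_0(\xi)\sqrt{u(\xi)}\int_{\mathbb R}e^{-u(\xi)z^2}\,dz\,d\xi = \int_T\rho_0(\xi)\int_{\mathbb R}e^{-\zeta^2}\,d\zeta\,d\xi = \sqrt{\pi}\int_T\rho_0(\xi)\,d\xi = \sqrt{\pi}. \tag{54}
\]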
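The computation in (54) is the substitution $\zeta=z\sqrt{u(\xi)}$ followed by $\int_T\rho_0=1$. As a hedged illustration, the sketch below evaluates the double integral numerically for one admissible pair; the specific $\rho_0$ and $u$ (smooth, close to $1$, with $\rho_0$ of unit mass) are hypothetical choices made only for this check.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def rho0(xi):
    """Hypothetical density on the torus with unit mass."""
    return 1 + 0.2 * math.cos(2 * math.pi * xi)

def u(xi):
    """Hypothetical positive periodic function close to 1."""
    return 1 + 0.25 * math.sin(2 * math.pi * xi)

def integral_54(nx=200, nz=400):
    """Integral over T x R of rho0 * sqrt(u) * exp(-u z^2), with R cut to [-10, 10]."""
    def outer(xi):
        inner = simpson(lambda z: math.exp(-u(xi) * z * z), -10.0, 10.0, nz)
        return rho0(xi) * math.sqrt(u(xi)) * inner
    return simpson(outer, 0.0, 1.0, nx)

print(integral_54(), math.sqrt(math.pi))
```

The result does not depend on $u$, in line with (54).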



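Finally we turn to the term

\[
I := \int_T\int_{\mathbb R}\rho_0(\xi)\sqrt{u(\xi)}\,e^{-u(\xi)z^2}\bigl[u(\xi)-\kappa^z_\varepsilon*u(\xi)\bigr]z^2\,dz\,d\xi.
\]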

Lemma 11. Let $\varepsilon>0$, let $\rho_0\in L^\infty(T)\cap C([0,1])$ with $\int_T\rho_0=1$, and let $u\in L^\infty(T)$. Recall that
$0<\delta<1/3$ with

\[ \|\rho_0-1\|_{L^\infty(T)}\le\delta \quad\text{and}\quad \|u-1\|_{L^\infty(T)}\le\delta. \]

Then

1 l-<5 ^„ „2 

^-"2(TT^^"""^ + ''- 

where $r_\varepsilon\to0$ as $\varepsilon\to0$, uniformly in $\delta$.

From this Lemma and the earlier estimates the result follows. Combining Lemma 7 with (52),
(54), Lemma 11 and (53),



- 2 (TT^^"""- + (i-3(5)V2 + 

where $S_\varepsilon=\omega(\varepsilon)+r_\varepsilon$ converges to zero as $\varepsilon\to0$, uniformly in $\delta$. Since $1/2>5/12$, for sufficiently
small $\delta>0$ the two middle terms add up to a negative value. Then it follows that $\limsup_{\varepsilon\to0}Z_\varepsilon\le1$.

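Proof of Lemma 11. Writing $I$ as

\[
I = 2\int_T\rho_0(\xi)\sqrt{u(\xi)}\int_{\mathbb R}e^{-u(\xi)z^2}\int_0^z(z-\sigma)\bigl(u(\xi)-u(\xi+\varepsilon\sigma)\bigr)\,d\sigma\,dz\,d\xi,
\]

we apply Fubini's Lemma in the $(z,\sigma)$-plane to find

\begin{align*}
I &= -2\int_T\rho_0(\xi)\sqrt{u(\xi)}\int_0^\infty\int_\sigma^\infty e^{-u(\xi)z^2}(z-\sigma)\bigl[u(\xi+\varepsilon\sigma)-2u(\xi)+u(\xi-\varepsilon\sigma)\bigr]\,dz\,d\sigma\,d\xi\\
&= -2\int_0^\infty\sigma\int_T\rho_0(\xi)\bigl[u(\xi+\varepsilon\sigma)-2u(\xi)+u(\xi-\varepsilon\sigma)\bigr]\,h(\sigma^2u(\xi))\,d\xi\,d\sigma,
\end{align*}

where

\[ h(s) := \frac{1}{\sqrt s}\int_{\sqrt s}^\infty\bigl(\zeta-\sqrt s\bigr)e^{-\zeta^2}\,d\zeta. \]

Since $\|u-1\|_\infty\le\delta$,

\begin{align}
h(\sigma^2u) &\le \frac{1}{2\sigma\sqrt u}\,e^{-u\sigma^2} \le \frac{1}{2\sigma\sqrt{1-\delta}}\,e^{-(1-\delta)\sigma^2}, \tag{55}\\
h'(\sigma^2u) &= -\frac{1}{4\sigma^3u^{3/2}}\,e^{-u\sigma^2} \le -\frac{1}{4\sigma^3(1+\delta)^{3/2}}\,e^{-(1+\delta)\sigma^2}. \tag{56}
\end{align}

Then, writing $D_{\varepsilon\sigma}f(\xi)$ for $f(\xi+\varepsilon\sigma)-f(\xi)$, we have

\begin{align*}
\int_T\rho_0(\xi)&\bigl[u(\xi+\varepsilon\sigma)-2u(\xi)+u(\xi-\varepsilon\sigma)\bigr]h(\sigma^2u(\xi))\,d\xi\\
&= -\int_T\rho_0(\xi)\,D_{\varepsilon\sigma}u(\xi)\,D_{\varepsilon\sigma}h(\sigma^2u)(\xi)\,d\xi - \int_TD_{\varepsilon\sigma}\rho_0(\xi)\,D_{\varepsilon\sigma}u(\xi)\,h(\sigma^2u(\xi+\varepsilon\sigma))\,d\xi,
\end{align*}

so that

\begin{align*}
I &= 2\int_0^\infty\sigma\int_T\rho_0(\xi)\,D_{\varepsilon\sigma}u(\xi)\,D_{\varepsilon\sigma}h(\sigma^2u)(\xi)\,d\xi\,d\sigma + 2\int_0^\infty\sigma\int_TD_{\varepsilon\sigma}\rho_0(\xi)\,D_{\varepsilon\sigma}u(\xi)\,h(\sigma^2u(\xi+\varepsilon\sigma))\,d\xi\,d\sigma\\
&=: I_a+I_b.
\end{align*}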

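Taking $I_b$ first, we estimate one part of this integral with (55) by

\begin{align*}
2\int_0^\infty\sigma\int_0^{1-\varepsilon\sigma}D_{\varepsilon\sigma}\rho_0(\xi)\,D_{\varepsilon\sigma}u(\xi)\,h(\sigma^2u(\xi+\varepsilon\sigma))\,d\xi\,d\sigma
&\le 2\int_0^\infty\sigma\,\omega_{\rho_0}(\varepsilon\sigma)\,2\delta\,\frac{1}{2\sigma\sqrt{1-\delta}}\,e^{-(1-\delta)\sigma^2}\,d\sigma\\
&= \frac{2\delta}{\sqrt{1-\delta}}\int_0^\infty\omega_{\rho_0}(\varepsilon\sigma)\,e^{-(1-\delta)\sigma^2}\,d\sigma,
\end{align*}

and this converges to zero as $\varepsilon\to0$, uniformly in $0<\delta<1/3$. The remainder of $I_b$ we estimate

\begin{align*}
2\int_0^\infty\sigma\int_{1-\varepsilon\sigma}^1D_{\varepsilon\sigma}\rho_0(\xi)\,D_{\varepsilon\sigma}u(\xi)\,h(\sigma^2u(\xi+\varepsilon\sigma))\,d\xi\,d\sigma
&\le 2\int_0^\infty\varepsilon\sigma^2\,2\delta\,\frac{1}{2\sigma\sqrt{1-\delta}}\,e^{-(1-\delta)\sigma^2}\,d\sigma\\
&= \frac{2\varepsilon\delta}{\sqrt{1-\delta}}\int_0^\infty\sigma\,e^{-(1-\delta)\sigma^2}\,d\sigma,
\end{align*}

which again converges to zero as $\varepsilon\to0$, uniformly in $\delta$.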
To estimate $I_a$ we note that by (56) and the chain rule,

1 



e-(i-^)'^' da 



and thus 



1-5 



1 



4a3 (1 + <5)3/2 



2(1 + 5)3/2 7o 
1-5 



-(l+S)a^ 



{D.^uiOfd^da 



2{l + Sr Jo 

& ^11 Il2 



2(1 + 5)2 

1-5 
2(1 + 5)2' 



^\\u\\[ 



□ 
Appendix A. Proofs of the Lemmas in Section 8.3



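Proof of Lemma 8. Since the left and right-hand sides are both quadratic in $u$, it is sufficient to
prove the lemma for a single Fourier mode $u(x)=\exp 2\pi ikx$, for which

\begin{align*}
\int_{\mathbb R}e^{-z^2}\int_T\bigl|u(x+\varepsilon z)-u(x)\bigr|^2\,dx\,dz
&= \int_{\mathbb R}e^{-z^2}\bigl|\exp 2\pi ik\varepsilon z-1\bigr|^2\,dz\\
&= 2\int_{\mathbb R}e^{-z^2}\bigl(1-\cos 2\pi k\varepsilon z\bigr)\,dz\\
&= 2\sqrt{\pi}\,\bigl(1-e^{-\pi^2k^2\varepsilon^2}\bigr),
\end{align*}

since

\[ \int_{\mathbb R}e^{-z^2}\,dz=\sqrt{\pi} \quad\text{and}\quad \int_{\mathbb R}e^{-z^2}\cos\omega z\,dz=\sqrt{\pi}\,e^{-\omega^2/4}. \]

□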
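The two Gaussian integrals used in this proof are classical. For readers who want a quick numerical confirmation, the following sketch compares a Simpson approximation of $\int_{\mathbb R}e^{-z^2}\cos\omega z\,dz$ with $\sqrt{\pi}\,e^{-\omega^2/4}$; the truncation to $[-10,10]$ and the helper names are illustrative assumptions.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def gauss_cos(omega):
    """Approximate the integral of exp(-z^2) cos(omega z) over the real line."""
    return simpson(lambda z: math.exp(-z * z) * math.cos(omega * z), -10.0, 10.0)

for omega in (0.0, 1.0, 2 * math.pi * 2 * 0.3):
    print(omega, gauss_cos(omega), math.sqrt(math.pi) * math.exp(-omega**2 / 4))
```

With $\omega=2\pi k\varepsilon$ this gives the factor $e^{-\pi^2k^2\varepsilon^2}$ appearing in the semi-norm.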



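Proof of Lemma 9. Again it is sufficient to prove the lemma for a single Fourier mode $u(x)=\exp 2\pi ikx$, for which

\[
\int_{\mathbb R}e^{-z^2}\int_T\bigl|u(x)-\kappa^z_\varepsilon*u(x)\bigr|^2z^4\,dx\,dz = \int_{\mathbb R}e^{-z^2}z^4\,\bigl|1-\widehat{\kappa^z_\varepsilon}(k)\bigr|^2\,dz.
\]

Writing $\omega:=2\pi k\varepsilon$, the Fourier transform of $\kappa^z_\varepsilon$ on $T$ is calculated to be

\[
\widehat{\kappa^z_\varepsilon}(k) = \int\kappa^z_\varepsilon(x)\,e^{-2\pi ikx}\,dx = -\frac{2}{\omega^2z^2}\bigl[e^{i\omega z}-1-i\omega z\bigr].
\]

Then

\[
1-\widehat{\kappa^z_\varepsilon}(k) = \frac{2}{\omega^2z^2}\Bigl[e^{i\omega z}-1-i\omega z+\frac{\omega^2z^2}{2}\Bigr],
\]

so that

\begin{align*}
z^4\bigl|1-\widehat{\kappa^z_\varepsilon}(k)\bigr|^2
&= \frac{4}{\omega^4}\Bigl[\Bigl(1-\cos\omega z-\frac{\omega^2z^2}{2}\Bigr)^2+(\sin\omega z-\omega z)^2\Bigr]\\
&= \frac{4}{\omega^4}\Bigl[2-2\cos\omega z+\frac{\omega^4z^4}{4}-2\omega z\sin\omega z+\omega^2z^2\cos\omega z\Bigr].
\end{align*}

We then calculate

\begin{align*}
\int_{\mathbb R}e^{-z^2}z^4\,dz &= \frac34\sqrt{\pi}, &
\int_{\mathbb R}e^{-z^2}\cos\omega z\,dz &= \sqrt{\pi}\,e^{-\omega^2/4},\\
\int_{\mathbb R}e^{-z^2}z\sin\omega z\,dz &= \frac{\omega}{2}\sqrt{\pi}\,e^{-\omega^2/4}, &
\int_{\mathbb R}e^{-z^2}z^2\cos\omega z\,dz &= \sqrt{\pi}\,e^{-\omega^2/4}\Bigl(\frac12-\frac{\omega^2}{4}\Bigr),
\end{align*}

implying that

\begin{align*}
\int_{\mathbb R}e^{-z^2}z^4\bigl|1-\widehat{\kappa^z_\varepsilon}(k)\bigr|^2\,dz
&= \frac{4\sqrt{\pi}}{\omega^4}\Bigl[2-2e^{-\omega^2/4}+\frac{3}{16}\omega^4-\omega^2e^{-\omega^2/4}+\omega^2e^{-\omega^2/4}\Bigl(\frac12-\frac{\omega^2}{4}\Bigr)\Bigr]\\
&= \frac{4\sqrt{\pi}}{\omega^4}\Bigl[2-2e^{-\omega^2/4}+\frac{3}{16}\omega^4-\frac{\omega^2}{2}e^{-\omega^2/4}-\frac{\omega^4}{4}e^{-\omega^2/4}\Bigr].
\end{align*}

We conclude the lemma by showing that the right-hand side is bounded from above by

\[ \frac56\sqrt{\pi}\,\bigl(1-e^{-\omega^2/4}\bigr). \]

Indeed, subtracting the two we find

\[
\frac{4\sqrt{\pi}}{\omega^4}\Bigl[2-2e^{-\omega^2/4}-\frac{\omega^2}{2}e^{-\omega^2/4}-\frac{\omega^4}{48}-\frac{\omega^4}{24}e^{-\omega^2/4}\Bigr],
\]

and setting $s:=\omega^2/4$ the sign of this expression is determined by

\[ 2-2e^{-s}-2se^{-s}-\frac23s^2e^{-s}-\frac{s^2}{3}. \]

This function is zero at $s=0$, and its derivative is

\[ \frac23\,s\,\bigl[e^{-s}(1+s)-1\bigr], \]

which is negative for all $s>0$ by the inequality $e^{-s}(1+s)<1$. □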
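The single-mode computation can be cross-checked numerically. The sketch below evaluates $\int_{\mathbb R}e^{-z^2}z^4|1-\widehat{\kappa^z_\varepsilon}(k)|^2\,dz$ by quadrature, compares it with the closed form in $s=\omega^2/4$, and tests the bound by $\frac56\sqrt{\pi}(1-e^{-s})$. The multiplier formula assumes the kernel normalisation $\int\kappa^z=1$ (our reading of the definition), and all function names are illustrative.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mode_integral(omega):
    """Quadrature of e^{-z^2} z^4 |1 - khat|^2, using the real/imaginary split."""
    def f(z):
        a = 1 - math.cos(omega * z) - 0.5 * (omega * z) ** 2
        b = math.sin(omega * z) - omega * z
        return math.exp(-z * z) * 4 / omega**4 * (a * a + b * b)
    return simpson(f, -10.0, 10.0)

def closed_form(omega):
    """Closed form of the same integral, written in s = omega^2 / 4."""
    s = omega**2 / 4
    e = math.exp(-s)
    return math.sqrt(math.pi) / (4 * s * s) * (2 * (1 - e) + 3 * s * s - 2 * s * e - 4 * s * s * e)

def bound(omega):
    """Right-hand side of Lemma 9 for a single mode: (5/6) sqrt(pi) (1 - e^{-s})."""
    return 5 / 6 * math.sqrt(math.pi) * (1 - math.exp(-omega**2 / 4))

for omega in (1.0, 2.0, 4.0):
    print(omega, mode_integral(omega), closed_form(omega), bound(omega))
```

For small $\omega$ the ratio of the integral to the bound tends to $1$, which is why the constant $5/6$ cannot be lowered.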

Proof of Lemma 10. Since the function $\alpha\mapsto1-e^{-\pi^2k^2\varepsilon^2/\alpha^2}$ is decreasing in $\alpha$, the first inequality
follows immediately. To prove the second it is sufficient to show that $1-e^{-\beta x}\le\beta(1-e^{-x})$ for
$\beta>1$ and $x>0$, which can be recognized by noting that both sides vanish at $x=0$ and differentiating both sides of the inequality with respect to $x$. □
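The scalar inequality behind the second estimate, $1-e^{-\beta x}\le\beta(1-e^{-x})$ for $\beta\ge1$ and $x\ge0$, can be checked on a grid. With $\beta=\alpha^{-2}$ and $x=\pi^2k^2\varepsilon^2$ it gives the mode-by-mode bound $\|u\|_{\varepsilon/\alpha}^2\le\alpha^{-2}\|u\|_\varepsilon^2$ for $0<\alpha<1$. The grid and the name `scaling_gap` are illustrative assumptions.

```python
import math

def scaling_gap(beta, x):
    """beta (1 - e^{-x}) - (1 - e^{-beta x}); expm1 keeps accuracy for small x."""
    return beta * (-math.expm1(-x)) - (-math.expm1(-beta * x))

# Smallest gap over a grid of beta in [1, 10] and x in [0, 20].
worst = min(scaling_gap(1 + 0.25 * i, 0.05 * j)
            for i in range(37) for j in range(401))
print(worst >= -1e-12)
```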